Semantic Annotation of Verbs for the Tatar Corpus
Authors
Abstract
This paper discusses the problem of developing a metalanguage for linguistic applications and introduces a tag set for the semantic annotation of verbs in the Tatar National Corpus. At present, there are no generally accepted standards for semantic annotation of corpora. In many cases, annotation schemes are developed by individual researchers or teams for a particular research project, and the tag sets used in thesauri and electronic corpora differ in many respects. Drawing on existing semantic classifications of vocabulary for various languages and on data from Tatar lexicons, we built a model of the semantic system of Tatar verbs and divided the verbs (3,200 words) into semantic classes. We distinguish two types of semantic tags: constructional (categorial) tags, which are independent of the semantic classes of verbs, and semantic (thematic) tags, which determine those classes. To delimit the classes we used hierarchical and overlapping classifications, so the same verb may belong to more than one class. The approach is based on data from explanatory dictionaries of the Tatar language, bilingual Russian-Tatar dictionaries, and the semantic annotation system of the Russian National Corpus. The current version of our semantic annotation uses 3 categorial and 59 thematic tags.
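As a rough illustration of what a hierarchical, overlapping tag scheme implies for data representation, the following Python sketch stores categorial and thematic tags separately and lets one verb carry several thematic tags, each expanded to its ancestors in the hierarchy. It is not the authors' implementation: the tag names (`caus`, `move`, `move:body`, `sound`), the placeholder lemma, and the helper functions `expand` and `annotate` are all hypothetical and do not come from the actual 3 + 59 tag inventory.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Tag:
    name: str
    kind: str                      # "categorial" or "thematic"
    parent: Optional[str] = None   # parent tag in the hierarchy, if any


# Hypothetical tag inventory, for illustration only.
TAGS: Dict[str, Tag] = {
    "caus": Tag("caus", "categorial"),
    "move": Tag("move", "thematic"),
    "move:body": Tag("move:body", "thematic", parent="move"),
    "sound": Tag("sound", "thematic"),
}


def expand(tag_name: str, tags: Dict[str, Tag] = TAGS) -> List[str]:
    """Return the tag followed by its ancestors, most specific first."""
    chain = []
    current: Optional[str] = tag_name
    while current is not None:
        chain.append(current)
        current = tags[current].parent
    return chain


def annotate(lemma: str, assigned: List[str],
             tags: Dict[str, Tag] = TAGS) -> dict:
    """Build an annotation record; a verb may carry several thematic tags."""
    categorial = sorted(t for t in assigned if tags[t].kind == "categorial")
    thematic = set()
    for t in assigned:
        if tags[t].kind == "thematic":
            # Hierarchical classification: a subclass implies its superclasses.
            thematic.update(expand(t, tags))
    return {"lemma": lemma, "categorial": categorial,
            "thematic": sorted(thematic)}


# Overlapping classification: one placeholder verb in two thematic classes.
record = annotate("example_verb", ["caus", "move:body", "sound"])
```

Keeping the two tag types in separate fields mirrors the distinction drawn in the abstract, where categorial tags are independent of the semantic classes that thematic tags define.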
Similar resources
Verbs in Applied Linguistics Research Article Introductions: Semantic and syntactic analysis
This study aims to investigate the semantic and syntactic features of verbs used in the introduction section of Applied Linguistics research articles published in Iranian and international journals. A corpus of 20 research article introductions (10 from each journal) was used. The corpus was analysed for the syntactic features (tense, aspect and voice) and semantic meaning of verbs. The finding...
Concordance-Based Data-Driven Learning Activities and Learning English Phrasal Verbs in EFL Classrooms
In spite of the highly beneficial applications of corpus linguistics in language pedagogy, it has not found its way into mainstream EFL. The major reasons seem to be the teachers’ lack of training and the unavailability of resources, especially computers in language classes. Phrasal verbs have been shown to be a problematic area of learning English as a foreign language due to their semantic op...
An annotation scheme for Persian based on Autonomous Phrases Theory and Universal Dependencies
A treebank is a corpus with linguistic annotations above the level of the parts of speech. During the first half of the present decade, three treebanks have been developed for Persian either originally or subsequently based on dependency grammar: Persian Treebank (PerTreeBank), Persian Syntactic Dependency Treebank, and Uppsala Persian Dependency Treebank (UPDT). The syntactic analysis of a sen...
Extending Fine-Grained Semantic Relation Classification to Presupposition Relations between Verbs
In contrast to typical semantic relations between verbs, such as antonymy, synonymy or hyponymy, presupposition is a lexical relation that is not very well covered in existing lexical resources. It is also understudied in the field of corpus-based methods of learning semantic relations. But presupposition is very important for the quality of automatic semantic and discourse analysis tasks. In t...